
    Template Based Modeling and Structural Refinement of Protein-Protein Interactions.

    Full text link
    Determining protein structures from sequence is a fundamental problem in molecular biology, as protein structure is essential to understanding protein function. In this study, I developed one of the first fully automated pipelines for template-based quaternary structure prediction starting from sequence. Two critical steps in template-based modeling are identifying the correct homologous structures by threading, which generates sequence-to-structure alignments, and refining the initial threading template coordinates toward the native conformation. I developed SPRING (single-chain-based prediction of interactions and geometries), a program that maps monomer threading results to dimer templates, and compared it with the dimer co-threading program COTH on 1838 non-homologous target complex structures. For first-ranked templates, SPRING's similarity score outperformed COTH, correctly identifying 798 interfaces versus 527. More importantly, the results of the two programs were complementary, and combining them in a consensus-based threading program yielded a 5.1% improvement over SPRING alone. Template-based modeling requires a structural analog to be present in the PDB. A full search of the PDB using threading and structural alignment revealed that only 48.7% of the PDB has a suitable template, whereas only 39.4% has templates that can be identified by threading. To circumvent this limitation, I added intramolecular domain-domain interfaces to the PDB template library to boost template recognition for protein dimers; merging the two classes of interfaces improved recognition of heterodimers by 40% under benchmark settings. Next, I created TACOS, a pipeline for template-based assembly of protein complexes. The pipeline combines threading templates and domain knowledge from the PDB into a knowledge-based energy score.
The energy score is integrated into a Monte Carlo sampling simulation that drives the initial template toward the native topology. The full pipeline was benchmarked on 350 non-homologous structures and compared with two state-of-the-art programs for dimeric structure prediction, ZDOCK and MODELLER. On average, TACOS models had better global and interface structure quality than models generated by MODELLER and ZDOCK. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/135847/1/bgovi_1.pd
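The abstract does not describe TACOS's move set or the terms of its knowledge-based energy, so the sketch below only illustrates the generic Metropolis Monte Carlo idea of driving a starting model downhill on an energy surface. The quadratic `toy_energy`, the single-coordinate Gaussian moves, the `State` representation, and all parameter values are hypothetical placeholders, not the TACOS score or sampler.

```python
import math
import random
from typing import Callable, List, Tuple

State = List[float]  # hypothetical: e.g., rigid-body parameters of one chain


def metropolis_mc(
    energy: Callable[[State], float],
    initial: State,
    n_steps: int = 5000,
    step_size: float = 0.5,
    temperature: float = 1.0,
    seed: int = 0,
) -> Tuple[State, float]:
    """Metropolis Monte Carlo: perturb one coordinate per step, accept with
    probability min(1, exp(-dE / T)), and return the lowest-energy state seen."""
    rng = random.Random(seed)
    current = list(initial)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.gauss(0.0, step_size)
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Downhill moves are always accepted; uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


def toy_energy(x: State) -> float:
    # Hypothetical stand-in for a knowledge-based score: squared distance
    # from a reference pose placed at the origin.
    return sum(v * v for v in x)


if __name__ == "__main__":
    start = [5.0, 5.0, 5.0]
    pose, e = metropolis_mc(toy_energy, start)
    print(f"initial energy {toy_energy(start):.1f}, best energy {e:.3f}")
```

Tracking the best state visited, rather than the final one, is a common design choice because a finite-temperature walk keeps fluctuating around the minimum.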

    Template-based protein structure prediction in CASP11 and retrospect of I-TASSER in the last decade

    Full text link
    We report the structure prediction results of a new composite pipeline for template-based modeling (TBM) in the 11th CASP experiment. Starting from multiple structure templates identified by LOMETS-based meta-threading programs, the QUARK ab initio folding program is extended to generate initial full-length models under strong constraints from template alignments. The final atomic models are then constructed by I-TASSER-based fragment reassembly simulations, followed by fragment-guided molecular dynamics simulation and MQAP-based model selection. The inclusion of QUARK-TBM simulations as an intermediate modeling step was found to help improve the quality of the I-TASSER models for both Easy and Hard TBM targets. Overall, the average TM-score of the first I-TASSER model is 12% higher than that of the best LOMETS templates, with the RMSD in the same threading-aligned regions reduced from 5.8 to 4.7 Å. Nevertheless, nearly 18% of TBM domains had templates that were degraded by the structure assembly pipeline, which may be attributed to errors in secondary structure and domain orientation predictions that propagate through and degrade the template identification and final model selection procedures. To examine the record of progress, we made a retrospective report of the I-TASSER pipeline in the last five CASP experiments (CASP7–11). The data show no clear progress of the LOMETS threading programs over PSI-BLAST; however, obvious progress in structural improvement relative to threading templates was seen in recent CASP experiments, probably owing to the integration of the extended ab initio folding simulation with the threading assembly pipeline and the introduction of atomic-level structure refinements following the reduced modeling simulations. Proteins 2016; 84(Suppl 1):233–246. © 2015 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/134137/1/prot24918.pd
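TM-score and RMSD are the two model-quality measures quoted above. As a hedged illustration, the sketch below computes both for a model whose residues are already paired with, and superposed onto, the native structure. Real TM-score programs additionally search for the superposition that maximizes the score, which this sketch omits, and the 0.5 Å floor on d0 for short chains is an assumption following common TM-align practice. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def tm_d0(l_target: int) -> float:
    """TM-score distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8, floored at
    0.5 A for short chains (the floor is an assumed convention)."""
    if l_target > 21:
        return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    return 0.5


def pair_distances(model: Sequence[Coord], native: Sequence[Coord]) -> List[float]:
    # Distances between aligned residue pairs; assumes the model is already superposed.
    return [math.dist(a, b) for a, b in zip(model, native)]


def tm_score_fixed(model: Sequence[Coord], native: Sequence[Coord], l_target: int) -> float:
    """TM-score for this one superposition only, normalized by target length."""
    d0 = tm_d0(l_target)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in pair_distances(model, native))
    return total / l_target


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root-mean-square deviation over the aligned residue pairs."""
    d = pair_distances(model, native)
    return math.sqrt(sum(x * x for x in d) / len(d))


if __name__ == "__main__":
    native = [(3.8 * i, 0.0, 0.0) for i in range(30)]  # toy CA trace
    model = [(x, 1.0, 0.0) for x, _, _ in native]       # uniformly shifted by 1 A
    print(round(rmsd(model, native), 3))                # 1.0
    print(round(tm_score_fixed(model, native, 30), 3))
```

For reference, the reported drop in RMSD from 5.8 to 4.7 Å corresponds to a reduction of about 19%.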

    Extending Protein Domain Boundary Predictors to Detect Discontinuous Domains.

    No full text
    A variety of protein domain predictors have been developed in recent years to predict protein domain boundaries, but most of them cannot predict discontinuous domains. Because nearly 40% of multidomain proteins contain one or more discontinuous domains, we developed DomEx to enable domain boundary predictors to detect discontinuous domains by assembling continuous domain segments. Discontinuous domains are predicted by matching the sequence profile of concatenated continuous domain segments against the profiles of a single-domain library derived from SCOP, CATH, and Pfam. The matches are then filtered by similarity to library templates, a symmetric index score, and a profile-profile alignment score. When tested on 97 non-homologous protein chains containing 58 continuous and 99 discontinuous domains, DomEx recalled 32.3% of the discontinuous domains with 86.5% precision, using the criterion that predicted domain segments lie within ±20 residues of the boundary definitions in CATH 3.5. Compared with our recently developed predictor ThreaDom, the state-of-the-art tool for detecting discontinuous domains, DomEx recalled 26.7% of discontinuous domains with 72.7% precision in a benchmark of 29 discontinuous-domain chains on which ThreaDom failed to predict any discontinuous domains. Furthermore, when combined with ThreaDom, the method ranked first among 10 predictors. The source code and datasets are available at https://github.com/xuezhidong/DomEx
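DomEx, as summarized above, builds candidate discontinuous domains by concatenating continuous segments and evaluates predictions with a ±20-residue boundary tolerance. The sketch below illustrates only those two bookkeeping steps under stated assumptions: segments are 1-based inclusive (start, end) pairs, every pair of segments is treated as a candidate, and the profile matching and filtering against the SCOP/CATH/Pfam library are left out. All function names are hypothetical and are not taken from the DomEx repository.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive residue range


def candidate_pairs(segments: Sequence[Segment]) -> List[Tuple[Segment, Segment]]:
    """Pair every two continuous segments (in N-to-C order) as a candidate
    two-segment discontinuous domain. Enumerating all pairs is an assumption."""
    return list(combinations(sorted(segments), 2))


def concatenate(sequence: str, pair: Sequence[Segment]) -> str:
    # Join the residues of each segment into one string; a profile would be built from this.
    return "".join(sequence[start - 1:end] for start, end in pair)


def boundaries_match(pred: Sequence[Segment], ref: Sequence[Segment], tol: int = 20) -> bool:
    """True if both domains have the same number of segments and every predicted
    start and end lies within +/- tol residues of the reference boundary."""
    if len(pred) != len(ref):
        return False
    return all(
        abs(p_start - r_start) <= tol and abs(p_end - r_end) <= tol
        for (p_start, p_end), (r_start, r_end) in zip(sorted(pred), sorted(ref))
    )


if __name__ == "__main__":
    # Segments of 1axkB's discontinuous domain (from the figure caption), plus a
    # hypothetical intervening segment and a made-up predicted domain.
    segments = [(1, 156), (157, 341), (342, 393)]
    print(len(candidate_pairs(segments)))                                    # 3
    print(boundaries_match([(5, 150), (340, 400)], [(1, 156), (342, 393)]))  # True
```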

    An illustration of the procedure to generate the samples.

    No full text
    A 3-domain chain is defined as (A1A2)(B)(C): A1 and A2 together form one structural domain, while B and C are each independent domains. The pair (A1A2) is treated as a "Positive" sample; (A1C) and (BC) are treated as "Negative" samples, and other combinations are ignored.

    The benchmark results of DomEx with domain boundary predictors.

    No full text
    The methods with discontinuous-domain detection are shown as dark bars.

    The recognition results of discontinuous domains at various TS-score and SI cutoffs.

    No full text
    (A) MCC; (B) Recall; (C) Precision (the 5 parallel lines show the boundaries of the precision region at different b values). (D) Graph of Eq 6 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0141541#pone.0141541.e009).
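This figure reports MCC, recall, and precision over a grid of TS-score and SI cutoffs. A minimal sketch of such a sweep follows, assuming each candidate match carries a TS-score, an SI value, and a true/false label, and assuming a candidate is accepted when both scores reach their cutoffs. The form of Eq 6 is not given here, so the sketch sweeps an unconstrained grid; the data layout and function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical record: (TS-score, SI, is a true discontinuous domain?)
Candidate = Tuple[float, float, bool]


def confusion(cands: Iterable[Candidate], t_ts: float, t_si: float) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN, accepting a candidate iff ts >= t_ts and si >= t_si (assumed rule)."""
    tp = fp = tn = fn = 0
    for ts, si, truth in cands:
        accepted = ts >= t_ts and si >= t_si
        if accepted and truth:
            tp += 1
        elif accepted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and Matthews correlation coefficient (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


def sweep(cands: List[Candidate], ts_grid: List[float], si_grid: List[float]):
    """Return (t_ts, t_si, precision, recall, mcc) for every cutoff pair in the grid."""
    return [
        (t_ts, t_si, *metrics(*confusion(cands, t_ts, t_si)))
        for t_ts in ts_grid
        for t_si in si_grid
    ]


if __name__ == "__main__":
    cands = [(0.8, 0.9, True), (0.7, 0.6, True), (0.4, 0.9, False), (0.6, 0.3, False)]
    best = max(sweep(cands, [0.5, 0.75], [0.5]), key=lambda row: row[4])
    print(best)  # (0.5, 0.5, 1.0, 1.0, 1.0)
```

Selecting cutoffs by MCC rather than by precision or recall alone balances both error types, which matters when positives and negatives are imbalanced.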

    Comparison of the templates from CATH+SCOP, Pfam, and CATH+SCOP+Pfam.

    No full text
    (A) Precision comparison; (B) The proportion of templates coming from CATH+SCOP and Pfam as parameter b varies.

    The training and validation results using $T_{SI} = f(T_{TS}, b)$ constraints.

    No full text
    The cutoffs $T_{TS}$ (for TS-score) and $T_{SI}$ (for SI) in Fig 5 are constrained by Eq 6. (A) Results on the Training Dataset with parameter b from 0.9 to 0.1; (B) results on the Validation Dataset with parameter b from 0.9 to 0.1.

    Cases of N- to C-terminal assembly.

    No full text
    (A) The 3D structure of PDB: 1axkB. The two segments of the discontinuous domain (1–156|342–393) are colored magenta and lemon green, respectively. (B) The 3D structure of PDB: 1u0aD, an AB-type Segment-Swapping Domain. (C) The 3D structure of PDB: 1cpmA, a BA-type Segment-Swapping Domain.